Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance
Deep generative models have emerged as a promising approach in the medical
image domain to address data scarcity. However, their use for sequential data
like respiratory sounds is less explored. In this work, we propose a
straightforward approach to augment imbalanced respiratory sound data using an
audio diffusion model as a conditional neural vocoder. We also demonstrate a
simple yet effective adversarial fine-tuning method to align features between
the synthetic and real respiratory sound samples to improve respiratory sound
classification performance. Our experimental results on the ICBHI dataset
demonstrate that the proposed adversarial fine-tuning is effective, whereas
using only the conventional augmentation method degrades performance.
Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and
improves the accuracy of the minority classes by up to 26.58%. For the
supplementary material, we provide the code at
https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.
Comment: accepted in NeurIPS 2023 Workshop on Deep Generative Models for Health (DGM4H)
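The feature-alignment idea described above can be sketched as a gradient-reversal step in the spirit of domain-adversarial training: a discriminator learns to tell real from synthetic features, while the encoder receives the reversed gradient so the two feature distributions converge. The following toy scalar version is only an illustration of that mechanism, not the paper's implementation; all names and the single-weight discriminator are assumptions.

```python
import math

# Toy sketch of adversarial feature alignment: a linear "discriminator"
# separates real features (label 1) from synthetic ones (label 0); the
# encoder is updated with the REVERSED discriminator gradient, pushing
# real and synthetic features toward the same distribution.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def disc_forward(w, b, x):
    """Probability that feature scalar x comes from a real recording."""
    return sigmoid(w * x + b)

def disc_grads(w, b, x, y):
    """Analytic gradients of binary cross-entropy w.r.t. w, b, and input x."""
    p = disc_forward(w, b, x)
    return (p - y) * x, (p - y), (p - y) * w  # dL/dw, dL/db, dL/dx

def adversarial_step(w, b, feats, labels, lr=0.1, lam=1.0):
    """One step: the discriminator descends its loss, while each feature
    ascends it (gradient reversal scaled by lam), shrinking the gap."""
    new_feats = []
    for x, y in zip(feats, labels):
        dw, db, dx = disc_grads(w, b, x, y)
        w -= lr * dw                         # discriminator: gradient descent
        b -= lr * db
        new_feats.append(x - lr * (-lam * dx))  # encoder sees the flipped dx
    return w, b, new_feats
```

In a real pipeline the same sign flip would sit between a CNN encoder and the domain head; here it reduces to negating `dx` before the feature update.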
Self-Contrastive Learning: Single-viewed Supervised Contrastive Framework using Sub-network
Contrastive loss has significantly improved performance in supervised
classification tasks by using a multi-viewed framework that leverages
augmentation and label information. The augmentation enables contrast with
another view of a single image but increases training time and memory usage. To
exploit the strength of multi-views while avoiding the high computation cost,
we introduce a multi-exit architecture that outputs multiple features of a
single image in a single-viewed framework. To this end, we propose
Self-Contrastive (SelfCon) learning, which self-contrasts within multiple
outputs from the different levels of a single network. The multi-exit
architecture efficiently replaces multi-augmented images and leverages various
information from different layers of a network. We demonstrate that SelfCon
learning improves the classification performance of the encoder network, and
empirically analyze its advantages in terms of the single-view and the
sub-network. Furthermore, we provide theoretical evidence of the performance
increase based on the mutual information bound. For ImageNet classification on
ResNet-50, SelfCon improves accuracy by +0.6% with 59% of the memory and 48% of
the training time of Supervised Contrastive learning, and a simple ensemble of multi-exit outputs
boosts performance by up to +1.5%. Our code is available at
https://github.com/raymin0223/self-contrastive-learning.
Comment: AAAI 202
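The core of the approach above is that the intermediate exit and the final exit of one network play the role of the two augmented views in supervised contrastive learning. The sketch below is a simplified reading of that loss (plain Python, toy 2-D features; the function names are illustrative, not the authors' code): features sharing a label, including the two exits of the same image, are treated as positives.

```python
import math

# Minimal sketch of a SelfCon-style contrastive loss: each image contributes
# a pair (intermediate-exit feature, final-exit feature) from a single
# forward pass; positives are all features with the same label.

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def selfcon_loss(features, labels, temperature=0.1):
    """features: list of (exit_feat, final_feat) per image (lists of floats).
    Returns the mean InfoNCE-style loss over all positive pairs."""
    # Flatten both exits into one pool, remembering each feature's label.
    pool = []
    for (f_mid, f_out), y in zip(features, labels):
        pool.append((l2_normalize(f_mid), y))
        pool.append((l2_normalize(f_out), y))
    total, count = 0.0, 0
    for i, (fi, yi) in enumerate(pool):
        denom = sum(math.exp(dot(fi, fj) / temperature)
                    for j, (fj, _) in enumerate(pool) if j != i)
        for j, (fj, yj) in enumerate(pool):
            if j != i and yj == yi:  # positive: same label (incl. same image)
                total -= math.log(math.exp(dot(fi, fj) / temperature) / denom)
                count += 1
    return total / max(count, 1)
```

When the two exits of each image agree, the loss is small; when they point in different directions, it grows, which is the signal that trains the sub-network exits.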
Fine-Tuning the Retrieval Mechanism for Tabular Deep Learning
While interest in tabular deep learning has grown significantly,
conventional tree-based models still outperform deep learning methods. To
narrow this performance gap, we explore the retrieval mechanism, a
methodology that allows neural networks to refer to other data points while
making predictions. Our experiments reveal that retrieval-based training,
especially when fine-tuning the pretrained TabPFN model, notably surpasses
existing methods. Moreover, extensive pretraining plays a crucial role in
enhancing the model's performance. These insights imply that blending the
retrieval mechanism with pretraining and transfer learning schemes offers
considerable potential for advancing the field of tabular deep learning.
Comment: Table Representation Learning Workshop at NeurIPS 202
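"Referring to other data points while making predictions" can be pictured as a nearest-neighbour lookup over the training table, a drastically simplified stand-in for the in-context retrieval that TabPFN-style models perform internally. The sketch below (hypothetical function names, Euclidean similarity as an assumption) shows the bare mechanism:

```python
import math

# Hedged sketch of a retrieval mechanism for tabular prediction: at
# inference time the model retrieves the most similar training rows and
# folds their labels into its prediction.

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query, rows, k=3):
    """Return the k training rows (features, label) closest to the query."""
    return sorted(rows, key=lambda r: euclidean(query, r[0]))[:k]

def retrieval_predict(query, rows, k=3):
    """Majority vote over the labels of the retrieved neighbours."""
    votes = {}
    for _, label in retrieve(query, rows, k):
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

In the fine-tuning setting the abstract describes, a learned embedding would replace the raw feature distance, and the retrieved rows would condition the network rather than vote directly.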
Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation
Explainable AI (XAI), an emerging field in machine learning, has shown
remarkable progress in interpreting the decisions made by
Convolutional Neural Networks (CNNs). To achieve visual explanations for CNNs,
methods based on class activation mapping and randomized input sampling have
gained great popularity. However, attribution methods based on these
techniques produce low-resolution, blurry explanation maps that limit
their explanatory power. To circumvent this issue, we turn to visualizations
drawn from multiple layers. In this work, we collect visualization maps from
multiple layers of the model based on an attribution-based input sampling
technique and aggregate them to reach a fine-grained and complete explanation.
We also propose a layer selection strategy that applies to the whole family of
CNN-based models, based on which our extraction framework is applied to
visualize the last layers of each convolutional block of the model. Moreover,
we perform an empirical analysis of the efficacy of derived lower-level
information to enhance the represented attributions. Comprehensive experiments
conducted on shallow and deep models trained on natural and industrial
datasets, using both ground-truth and model-truth based evaluation metrics,
validate our proposed algorithm: it matches or outperforms state-of-the-art
methods in explanation ability and visual quality, and it remains stable
regardless of the size of the objects or instances being explained.
Comment: 9 pages, 9 figures, Accepted at the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
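The block-wise aggregation step can be illustrated independently of the paper's full pipeline: attribution maps taken from different convolutional blocks arrive at different resolutions, so each is upsampled to a common size, min-max normalized, and averaged into one explanation map. This is a simplified sketch under those assumptions, not the authors' exact fusion scheme:

```python
# Illustrative sketch of block-wise attribution-map aggregation: upsample
# each per-block map to the input resolution, normalize it to [0, 1], and
# average the results into a single fine-grained explanation map.

def upsample_nearest(m, size):
    """Nearest-neighbour upsampling of a 2-D map (list of lists) to size x size."""
    h, w = len(m), len(m[0])
    return [[m[i * h // size][j * w // size] for j in range(size)]
            for i in range(size)]

def minmax(m):
    """Min-max normalize a 2-D map to the [0, 1] range."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in m]

def aggregate(maps, size):
    """Average the normalized, upsampled per-block attribution maps."""
    ups = [minmax(upsample_nearest(m, size)) for m in maps]
    return [[sum(u[i][j] for u in ups) / len(ups) for j in range(size)]
            for i in range(size)]
```

Normalizing before averaging keeps a high-magnitude deep-layer map from drowning out the finer low-level maps, which is the point of mixing blocks at all.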
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
Respiratory sound contains crucial information for the early diagnosis of
fatal lung diseases. Since the COVID-19 pandemic, there has been a growing
interest in contact-free medical care based on electronic stethoscopes. To this
end, cutting-edge deep learning models have been developed to diagnose lung
diseases; however, accurate diagnosis remains challenging due to the scarcity of medical data.
In this study, we demonstrate that the pretrained model on large-scale visual
and audio datasets can be generalized to the respiratory sound classification
task. In addition, we introduce a straightforward Patch-Mix augmentation, which
randomly mixes patches between different samples, with Audio Spectrogram
Transformer (AST). We further propose a novel and effective Patch-Mix
Contrastive Learning to distinguish the mixed representations in the latent
space. Our method achieves state-of-the-art performance on the ICBHI dataset,
outperforming the prior leading score by 4.08%.
Comment: INTERSPEECH 2023, Code URL: https://github.com/raymin0223/patch-mix_contrastive_learnin
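The Patch-Mix augmentation described above can be sketched directly: a spectrogram is split into fixed-size patches (as the Audio Spectrogram Transformer tokenizes it), and a random subset of patches is swapped in from another sample, with the mix ratio doubling as a label-interpolation weight. The version below is my reading of the abstract with illustrative names, not the released implementation:

```python
import random

# Hedged sketch of Patch-Mix: replace a random fraction of one
# spectrogram's patches with the corresponding patches of another sample.

def patch_mix(spec_a, spec_b, patch, ratio, rng=random):
    """Replace about `ratio` of spec_a's patches with spec_b's.
    spec_a, spec_b: equal-sized 2-D lists; patch: patch side length.
    Returns (mixed spectrogram, actual fraction of patches swapped)."""
    h, w = len(spec_a), len(spec_a[0])
    coords = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    chosen = rng.sample(coords, int(len(coords) * ratio))
    mixed = [row[:] for row in spec_a]          # copy; spec_a stays intact
    for i0, j0 in chosen:
        for i in range(i0, min(i0 + patch, h)):
            for j in range(j0, min(j0 + patch, w)):
                mixed[i][j] = spec_b[i][j]
    return mixed, len(chosen) / len(coords)
```

The returned fraction would weight the two samples' labels during training, and the contrastive objective in the paper then learns to separate such mixed representations in latent space.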